Goto

Collaborating Authors

 action strategy


Mobile Manipulation Planning for Tabletop Rearrangement

arXiv.org Artificial Intelligence

Efficient tabletop rearrangement planning seeks to find high-quality solutions while minimizing total cost. However, the task is challenging due to object dependencies and limited buffer space for temporary placements. The complexity increases for mobile robots, which must navigate around the table with restricted access. A*-based methods yield high-quality solutions, but struggle to scale as the number of objects increases. Monte Carlo Tree Search (MCTS) has been introduced as an anytime algorithm, but its convergence speed to high-quality solutions remains slow. Previous work~\cite{strap2024} accelerated convergence but required the robot to move to the closest position to the object for each pick and place operation, leading to inefficiencies. To address these limitations, we extend the planner by introducing a more efficient strategy for mobile robots. Instead of selecting the nearest available location for each action, our approach allows multiple operations (e.g., pick-and-place) from a single standing position, reducing unnecessary movement. Additionally, we incorporate state re-exploration to further improve plan quality. Experimental results show that our planner outperforms existing planners both in terms of solution quality and planning time.


Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions

arXiv.org Artificial Intelligence

OpenAI recently introduces the groundbreaking o1 model [OpenAI, 2024, Zhong et al., 2024], renowned for its exceptional reasoning capabilities. This model has demonstrates outstanding performance on platforms such as AIME and CodeForces, surpassing other leading models. Inspired by this success, we aim to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex, real-world challenges. Inspired by OpenAI's o1, we aim to explore potential approaches to shed light on the currently unclear technical roadmap for large reasoning models (LRM). Marco-o1 leverages advanced techniques like CoT fine-tuning [Wei et al., 2022], MCTS [Wei et al., 2022, Feng et al., 2023, Silver et al., 2017], and Reasoning Action Strategies to enhance its reasoning power. As shown in Figure 2, by finetuning Qwen2-7B-Instruct [Yang et al., 2024] with a combination of the filtered Open-O1 CoT dataset [OpenO1 Team, 2024], Marco-o1 CoT dataset, and Marco-o1 Instruction dataset, Marco-o1 improves its handling of complex tasks.


SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search

arXiv.org Artificial Intelligence

Conversational Recommender Systems (CRS) proactively engage users in interactive dialogues to elicit user preferences and provide personalized recommendations. Existing methods train Reinforcement Learning (RL)-based agent with greedy action selection or sampling strategy, and may suffer from suboptimal conversational planning. To address this, we present a novel Monte Carlo Tree Search (MCTS)-based CRS framework SAPIENT. SAPIENT consists of a conversational agent (S-agent) and a conversational planner (S-planner). S-planner builds a conversational search tree with MCTS based on the initial actions proposed by S-agent to find conversation plans. The best conversation plans from S-planner are used to guide the training of S-agent, creating a self-training loop where S-agent can iteratively improve its capability for conversational planning. Furthermore, we propose an efficient variant SAPIENT-e for trade-off between training efficiency and performance. Extensive experiments on four benchmark datasets validate the effectiveness of our approach, showing that SAPIENT outperforms the state-of-the-art baselines.


CFR-MIX: Solving Imperfect Information Extensive-Form Games with Combinatorial Action Space

arXiv.org Artificial Intelligence

In many real-world scenarios, a team of agents coordinate with each other to compete against an opponent. The challenge of solving this type of game is that the team's joint action space grows exponentially with the number of agents, which results in the inefficiency of the existing algorithms, e.g., Counterfactual Regret Minimization (CFR). To address this problem, we propose a new framework of CFR: CFR-MIX. Firstly, we propose a new strategy representation that represents a joint action strategy using individual strategies of all agents and a consistency relationship to maintain the cooperation between agents. To compute the equilibrium with individual strategies under the CFR framework, we transform the consistency relationship between strategies to the consistency relationship between the cumulative regret values. Furthermore, we propose a novel decomposition method over cumulative regret values to guarantee the consistency relationship between the cumulative regret values. Finally, we introduce our new algorithm CFR-MIX which employs a mixing layer to estimate cumulative regret values of joint actions as a non-linear combination of cumulative regret values of individual actions. Experimental results show that CFR-MIX outperforms existing algorithms on various games significantly.


Towards Multi-agent Reinforcement Learning for Wireless Network Protocol Synthesis

arXiv.org Artificial Intelligence

This paper proposes a multi-agent reinforcement learning based medium access framework for wireless networks. The access problem is formulated as a Markov Decision Process (MDP), and solved using reinforcement learning with every network node acting as a distributed learning agent. The solution components are developed step by step, starting from a single-node access scenario in which a node agent incrementally learns to control MAC layer packet loads for reining in self-collisions. The strategy is then scaled up for multi-node fully-connected scenarios by using more elaborate reward structures. It also demonstrates preliminary feasibility for more general partially connected topologies. It is shown that by learning to adjust MAC layer transmission probabilities, the protocol is not only able to attain theoretical maximum throughput at an optimal load, but unlike classical approaches, it can also retain that maximum throughput at higher loading conditions. Additionally, the mechanism is agnostic to heterogeneous loading while preserving that feature. It is also shown that access priorities of the protocol across nodes can be parametrically adjusted. Finally, it is also shown that the online learning feature of reinforcement learning is able to make the protocol adapt to time-varying loading conditions.


Incremental Reinforcement Learning --- a New Continuous Reinforcement Learning Frame Based on Stochastic Differential Equation methods

arXiv.org Machine Learning

Continuous reinforcement learning such as DDPG and A3C are widely used in robot control and autonomous driving. However, both methods have theoretical weaknesses. While DDPG cannot control noises in the control process, A3C does not satisfy the continuity conditions under the Gaussian policy. To address these concerns, we propose a new continues reinforcement learning method based on stochastic differential equations and we call it Incremental Reinforcement Learning (IRL). This method not only guarantees the continuity of actions within any time interval, but controls the variance of actions in the training process. In addition, our method does not assume Markov control in agents' action control and allows agents to predict scene changes for action selection. With our method, agents no longer passively adapt to the environment. Instead, they positively interact with the environment for maximum rewards.


Perspectives on Artificial Intelligence Planning

AITopics Original Links

Planning is a key area in artificial intelligence. In its general form, planning is concerned with the automatic synthesis of action strategies (plans) from a description of actions, sensors, and goals. Planning thus contrasts with two other approaches to intelligent behavior: the programming approach, where action strategies are defined by hand, and the learning approach, where action strategies are inferred from experience. Different assumptions about the nature of actions, sensors, and costs lead to various forms of planning: planning with complete information and deterministic actions (classical planning), planning with non-deterministic actions and sensing, planning with temporal and concurrent actions, etc. Most work so far has been devoted to classical planning, where significant changes have taken place in the last few years.